# PROCO

This is the repository for the paper **Proactive Cost Generation for Offline Safe Reinforcement Learning Without Unsafe Data**.

## Algorithm projects

### Installation Instructions

1. Create Environment
    
    This project uses separate conda environments for PROCO and the baselines.
    
    First, create both environments:
    
    ```bash
    conda create -n proco_base python=3.8.19
    conda create -n proco python=3.9.17
    ```
    
    Then install PyTorch and the packages listed in the requirements files (`requirements.txt` for `proco_base`, `requirements1.txt` for `proco`):
    
    ```bash
    conda activate proco_base
    pip install -r requirements.txt
    conda activate proco
    pip install -r requirements1.txt
    ```
    
    After that, enter each of the directories `DSRL`, `FSRL`, and `OSRL` and install the package in each conda environment:
    
    ```bash
    pip install -e .
    ```
    
2. Download Datasets
    
    First, download the datasets of original OSRL tasks from [http://data.offline-saferl.org/download](http://data.offline-saferl.org/download).
    
    Then edit line 23 of `DSRL/dsrl/offline_env.py`, replacing `XXX` with the path to the directory that holds the downloaded data.
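
The editable installs in step 1 and the data-root edit in step 2 can be scripted. The following is a minimal sketch, not part of the repository. It assumes that `DSRL`, `FSRL`, and `OSRL` sit directly under the repository root, that all three packages are installed into both environments, and that line 23 of `offline_env.py` contains the literal placeholder `XXX`. The names `REPO_ROOT`, `DRY_RUN`, `install_editable`, and `set_data_root` are hypothetical and introduced only for illustration.

```bash
#!/usr/bin/env bash
# Sketch of the two installation steps. Assumptions (not confirmed by the repo):
#   - DSRL, FSRL and OSRL sit directly under REPO_ROOT;
#   - all three packages are installed into both conda environments;
#   - the placeholder on line 23 of offline_env.py is the literal string XXX.
set -eu

REPO_ROOT="${REPO_ROOT:-.}"   # hypothetical: root of the cloned repository
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the pip commands, 0 = run them

# Step 1: editable install of each package in each conda environment.
install_editable() {
    for env in proco_base proco; do
        for pkg in DSRL FSRL OSRL; do
            if [ "$DRY_RUN" = "1" ]; then
                echo "conda run -n $env pip install -e $REPO_ROOT/$pkg"
            else
                conda run -n "$env" pip install -e "$REPO_ROOT/$pkg"
            fi
        done
    done
}

# Step 2: replace XXX on line 23 of the given file with the dataset root.
set_data_root() {
    file="$1"
    root="$2"
    tmp="$(mktemp)"
    # '|' is the sed delimiter, so the path may contain '/' (but not '|' or '&').
    sed "23s|XXX|${root}|" "$file" > "$tmp"
    cp "$tmp" "$file"         # cp keeps the permissions of the original file
    rm -f "$tmp"
}

install_editable
# Example call for step 2 (the data root shown is a placeholder):
# set_data_root "$REPO_ROOT/DSRL/dsrl/offline_env.py" "$HOME/dsrl_data"
```

By default the script only prints the six `pip` commands so they can be reviewed; setting `DRY_RUN=0` executes them. The `requirements` files still need to be installed first, as described in step 1.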
    

### PROCO Training

First, activate the conda environment and enter the OSRL folder:

```bash
conda activate proco
cd OSRL
```

Then train the dynamics models:

```bash
bash run_dynamics_model.sh
```

Next, add the path of each learned dynamics model to the `env2dynamics` dict in `PROCO_FISOR/launcher/examples/train_offline_safer.py`, `OSRL/examples/train/train_bcql_safer.py`, `OSRL/examples/train/train_coptidice_safer.py`, and `OSRL/examples/train/train_cpq_safer.py`.
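
Because the same dict has to be edited in four files, it can help to list where `env2dynamics` appears in each of them before and after editing. The sketch below is only an illustration: `locate_env2dynamics`, `TRAIN_SCRIPTS`, and `REPO_ROOT` are hypothetical names, and the file paths are assumed to be relative to the repository root.

```bash
#!/usr/bin/env bash
# Sketch: show where env2dynamics appears in each script named above,
# so the dynamics-model paths can be filled in and double-checked.
# Assumption: the paths below are relative to the repository root.

TRAIN_SCRIPTS="
PROCO_FISOR/launcher/examples/train_offline_safer.py
OSRL/examples/train/train_bcql_safer.py
OSRL/examples/train/train_coptidice_safer.py
OSRL/examples/train/train_cpq_safer.py
"

# locate_env2dynamics is a hypothetical helper, not a script shipped with PROCO.
locate_env2dynamics() {
    root="${1:-.}"
    for rel in $TRAIN_SCRIPTS; do
        if [ ! -f "$root/$rel" ]; then
            echo "$rel: file not found"
        elif ! grep -Hn "env2dynamics" "$root/$rel"; then
            # grep printed nothing, so the dict name does not occur in this file
            echo "$rel: env2dynamics not found"
        fi
    done
}

locate_env2dynamics "${REPO_ROOT:-.}"
```

Each matching line is printed as `file:line:content`, which gives the locations to edit.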

Finally, enter the `PROCO_FISOR` folder and train PROCO:

```bash
cd PROCO_FISOR
bash run_proco.sh
```

### Baseline Training

For the FISOR baseline, activate the `proco` environment, enter the `PROCO_FISOR` folder, and train:

```bash
conda activate proco
cd PROCO_FISOR
bash run_fisor.sh
```

For the other baselines, activate the `proco_base` environment, enter the `OSRL` folder, and run the script for the chosen baseline, replacing `{baseline_name}` with its name:

```bash
conda activate proco_base
cd OSRL
bash run_{baseline_name}.sh
```
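
The valid values for `{baseline_name}` can be read off the run scripts in the `OSRL` folder. The helper below is a rough sketch under the assumption that each baseline has a script named `run_<name>.sh` there; `list_baselines` and `OSRL_DIR` are hypothetical names, not part of the repository. It skips `run_dynamics_model.sh` and the `run_*_proco.sh` variants, which are not plain baseline scripts.

```bash
#!/usr/bin/env bash
# Sketch: derive valid {baseline_name} values from the run scripts on disk.
# Assumption: every baseline has a script named run_<name>.sh in the OSRL folder.

# list_baselines is a hypothetical helper, not a script shipped with PROCO.
list_baselines() {
    dir="${1:-.}"
    for script in "$dir"/run_*.sh; do
        [ -e "$script" ] || continue          # glob matched nothing
        name="$(basename "$script" .sh)"      # run_<name>.sh -> run_<name>
        name="${name#run_}"                   # run_<name>    -> <name>
        case "$name" in
            dynamics_model|*_proco) continue ;;   # not plain baseline scripts
        esac
        echo "$name"
    done
}

list_baselines "${OSRL_DIR:-.}"
```

Run from inside the `OSRL` folder, each printed name can be substituted into the command above, for example `bash "run_${name}.sh"`.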

### Combine PROCO with Other Baselines

To combine PROCO with soft-constraint offline safe RL baselines, activate the `proco_base` environment, enter the `OSRL` folder, and train:

```bash
conda activate proco_base
cd OSRL
bash run_{baseline_name}_proco.sh
```

## Note

The implementation is based on [OSRL](https://www.offline-saferl.org/) and [FISOR](https://github.com/ZhengYinan-AIR/FISOR), both of which are open source.